
    Clustering Stability: An Overview

    A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult to interpret for non-experts. In this paper we give a high-level overview of the existing literature on clustering stability. In addition to presenting the results in a slightly informal but accessible way, we relate them to each other and discuss their different implications.
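
    As a rough illustration of the stability heuristic (not the exact protocol analyzed in the papers surveyed here), one can cluster pairs of random subsamples for each candidate k, compare the resulting labelings on the points the subsamples share, and pick the k with the highest average agreement. The function names and parameters below are illustrative; the sketch assumes NumPy and scikit-learn.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    def stability_score(X, k, n_pairs=20, subsample=0.8, seed=0):
        # Mean agreement (adjusted Rand index) between clusterings of
        # random subsamples; higher means "more stable" for this k.
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        m = int(subsample * n)
        scores = []
        for _ in range(n_pairs):
            idx1 = rng.choice(n, size=m, replace=False)
            idx2 = rng.choice(n, size=m, replace=False)
            labels1 = KMeans(n_clusters=k, n_init=10).fit_predict(X[idx1])
            labels2 = KMeans(n_clusters=k, n_init=10).fit_predict(X[idx2])
            # compare the two clusterings on the points both subsamples share
            common = np.intersect1d(idx1, idx2)
            if len(common) < 2:
                continue
            map1 = dict(zip(idx1, labels1))
            map2 = dict(zip(idx2, labels2))
            scores.append(adjusted_rand_score([map1[i] for i in common],
                                              [map2[i] for i in common]))
        return float(np.mean(scores))

    # choose the number of clusters whose results are "most stable"
    # best_k = max(range(2, 10), key=lambda k: stability_score(X, k))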

    Kernel functions based on triplet comparisons

    Given only information in the form of similarity triplets "Object A is more similar to object B than to object C" about a data set, we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim at defining kernel functions that correspond to high-dimensional embeddings. These kernel functions can subsequently be used to apply any kernel method to the data set.
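
    One hypothetical way to make this concrete (the two kernels proposed in the paper may well differ from this sketch) is to represent every object by a ±1 vector of its answers over a fixed set of landmark pairs and to take inner products of these vectors, which gives a positive semi-definite kernel by construction. All names below are illustrative.

    import numpy as np

    def triplet_feature_map(answer, n_objects, landmark_pairs):
        # answer(a, b, c) -> True if object a is more similar to b than to c
        Phi = np.zeros((n_objects, len(landmark_pairs)))
        for j, (b, c) in enumerate(landmark_pairs):
            for a in range(n_objects):
                if a in (b, c):
                    continue
                Phi[a, j] = 1.0 if answer(a, b, c) else -1.0
        return Phi

    def triplet_kernel(Phi):
        # Gram matrix of the feature map; positive semi-definite by construction
        return Phi @ Phi.T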

    Shortest path distance in random k-nearest neighbor graphs

    Consider a weighted or unweighted k-nearest neighbor graph that has been built on n data points drawn randomly according to some density p on R^d. We study the convergence of the shortest path distance in such graphs as the sample size tends to infinity. We prove that for unweighted kNN graphs, this distance converges to an unpleasant distance function on the underlying space whose properties are detrimental to machine learning. We also study the behavior of the shortest path distance in weighted kNN graphs. Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
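
    The two graph distances contrasted here can be computed directly; the sketch below builds a kNN graph once with unit edge lengths and once with Euclidean edge lengths and computes all shortest path distances. The parameter values are illustrative, and the sketch assumes scikit-learn and SciPy.

    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from scipy.sparse.csgraph import shortest_path

    def knn_shortest_paths(X, k=10):
        # weighted graph: edges carry the Euclidean distance between neighbors
        W = kneighbors_graph(X, n_neighbors=k, mode='distance')
        # unweighted graph: every edge has length 1
        A = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
        # symmetrize so the kNN graphs are undirected
        W = W.maximum(W.T)
        A = A.maximum(A.T)
        d_weighted = shortest_path(W, method='D', directed=False)
        d_unweighted = shortest_path(A, method='D', directed=False, unweighted=True)
        return d_weighted, d_unweighted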

    A Tutorial on Spectral Clustering

    In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance, spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
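
    A minimal sketch of one of the standard algorithms covered in the tutorial, normalized spectral clustering: build a similarity graph, form the normalized graph Laplacian, embed the points with the eigenvectors belonging to the k smallest eigenvalues, and run k-means in the embedding. The Gaussian similarity graph and the parameter sigma below are illustrative choices, assuming NumPy, SciPy and scikit-learn.

    import numpy as np
    from scipy.linalg import eigh
    from sklearn.cluster import KMeans
    from sklearn.metrics import pairwise_distances

    def spectral_clustering(X, k, sigma=1.0):
        # fully connected similarity graph with a Gaussian kernel
        D2 = pairwise_distances(X, metric='sqeuclidean')
        W = np.exp(-D2 / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        d = W.sum(axis=1)
        # symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
        # eigenvectors belonging to the k smallest eigenvalues
        _, U = eigh(L_sym, subset_by_index=[0, k - 1])
        # row-normalize the embedding and cluster the embedded points
        U = U / np.linalg.norm(U, axis=1, keepdims=True)
        return KMeans(n_clusters=k, n_init=10).fit_predict(U)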

    Graph Laplacians and their convergence on random neighborhood graphs

    Given a sample from a probability measure with support on a submanifold in Euclidean space, one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods like semi-supervised learning, dimensionality reduction and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However, in the case of a non-uniform measure on the submanifold only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator. Comment: Improved presentation, typos corrected, to appear in JMLR.
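
    For reference, the three graph Laplacians commonly used in the literature can be written down directly from the weight matrix W of a neighborhood graph; the dense NumPy sketch below is for readability only, and a sparse implementation would follow the same formulas.

    import numpy as np

    def graph_laplacians(W):
        d = W.sum(axis=1)                      # degrees
        D = np.diag(d)
        L_unnorm = D - W                       # unnormalized Laplacian
        D_inv = np.diag(1.0 / d)
        L_rw = np.eye(len(d)) - D_inv @ W      # random walk Laplacian I - D^{-1} W
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt  # symmetric version
        return L_unnorm, L_rw, L_sym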

    Consistent procedures for cluster tree estimation and pruning

    For a density f on R^d, a high-density cluster is any connected component of {x : f(x) ≥ λ}, for some λ > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present two procedures for estimating the cluster tree given samples from f. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the k-nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.
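
    A rough plug-in sketch of the idea (not the exact procedures of the paper): estimate the density with k nearest neighbors and, for each level λ, take the connected components of the kNN graph restricted to the points whose estimated density is at least λ. Function names and parameters are illustrative; the sketch assumes NumPy, SciPy and scikit-learn.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors, kneighbors_graph
    from scipy.sparse.csgraph import connected_components

    def knn_density(X, k=10):
        # classical kNN density estimate up to a constant: ~ k / (n * r_k(x)^d)
        n, d = X.shape
        r_k = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)[0][:, -1]
        return k / (n * r_k ** d)

    def cluster_tree_levels(X, levels, k=10):
        f_hat = knn_density(X, k)
        G = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
        G = G.maximum(G.T)                     # undirected kNN graph
        tree = {}
        for lam in levels:
            keep = np.where(f_hat >= lam)[0]   # points above the level
            if len(keep) == 0:
                tree[lam] = []
                continue
            sub = G[keep][:, keep]             # graph restricted to those points
            n_comp, labels = connected_components(sub, directed=False)
            tree[lam] = [keep[labels == c] for c in range(n_comp)]
        return tree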